Support for multiple LoRAs with multiple models via BBR using conventional name for LoRA: model-name/lora/lora-name #2

davidbreitgand · 2025-09-15T19:20:23Z

The LoRA name used by a client: model-name/lora/lora-name
The "lora" keyword separator can also be set via the LORA_TAG environment variable.
Known issue: it would be better to use an "official" OpenAI struct for unmarshalling.


nirrozenbaum · 2025-09-16T11:22:53Z

pkg/bbr/handlers/request.go

+	logger.V(logutil.DEFAULT).Info("Orig: " + string(orig))
+
+	if idx := bytes.Index(orig, []byte(loraTag)); idx != -1 {
+		lastSlash := bytes.LastIndex(orig[:idx], []byte("/"))


is this assuming adapter name doesn't contain /?

No, I take everything before loraTag as prefix and everything after loraTag as suffix.
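To make the reply concrete, here is a minimal sketch of a prefix/suffix split. It assumes the separator is "/" + tag + "/" and splits on its first occurrence with `strings.Cut`; it is not the PR's byte-based code, and `splitLoRAModel` is a hypothetical name. Because everything after the separator is kept as the suffix, an adapter name that itself contains `/` survives intact.

```go
package main

import (
	"fmt"
	"strings"
)

// splitLoRAModel is a hypothetical helper (not from the PR). It splits a model
// name of the form <base-model>/<tag>/<adapter> on the FIRST occurrence of
// "/<tag>/". Everything before it is the prefix (base model) and everything
// after it is the suffix (adapter name), so slashes inside the adapter are kept.
func splitLoRAModel(model, tag string) (base, adapter string, ok bool) {
	base, adapter, ok = strings.Cut(model, "/"+tag+"/")
	if !ok || base == "" || adapter == "" {
		return "", "", false
	}
	return base, adapter, true
}

func main() {
	base, adapter, ok := splitLoRAModel("meta-llama/Llama-3.1-8B/lora/org/sql-adapter", "lora")
	fmt.Println(base, adapter, ok)
	// meta-llama/Llama-3.1-8B org/sql-adapter true
}
```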

nirrozenbaum · 2025-09-16T11:24:59Z

pkg/bbr/handlers/request.go

+				// skip the slash itself by adding +1 so suffix doesn't start with '/'
+				suffix = afterTag[nextSlash+1:]
+				logger.V(logutil.DEFAULT).Info("Model name after mutation:" + string(suffix))
+				requestBody.Model = string(suffix)


I think you should update the Content-Length header whenever you change anything in the request body.

elevran

review comments provided

elevran · 2025-09-18T13:25:04Z

config/charts/body-based-routing/values.yaml

    hub: us-central1-docker.pkg.dev/k8s-staging-images/gateway-api-inference-extension
-    tag: main
-    pullPolicy: Always
+    #tag: main


A few comments:

While fine for local development, these changes should not be part of the PR. The charts should continue to work with main.

Also, for local dev you want pullPolicy: Always (even when running on kind), since you want newly built images to be pulled to the worker nodes (from kind, not Docker).

elevran · 2025-09-18T13:25:27Z

config/manifests/bbr/bbr-multi-model.yaml

+    backendRefs:
+    - name: deepseek-r1
+      group: inference.networking.x-k8s.io
+      kind: InferencePool


Missing newline at EOF.

elevran · 2025-09-18T13:26:27Z

config/manifests/bbr/bbr-multi-model.yaml

Which YAMLs are used by the current guides? Are those not checked in, so you created a new one?
If so, you might want to refer to the same file in the multi-model guide.

The same applies to the other YAMLs in this PR.

elevran · 2025-09-18T13:28:17Z

pkg/bbr/handlers/request.go

+	"bytes"
 	"context"
 	"encoding/json"
+	"os"


There's an IGW package for reading environment variables, including setting defaults, casting to types, etc.

elevran · 2025-09-18T13:28:47Z

pkg/bbr/handlers/request.go

 		return nil, err
 	}

+	//The reason for this additional unmarshal is that I change the model name and then re-marshal RequestBody struct. But it has only one field, and I need to preserve original message at re-marshalling.


Missing space after // in comments.
This applies in multiple places.

elevran · 2025-09-18T13:30:26Z

pkg/bbr/handlers/request.go


+	//The reason for this additional unmarshal is that I change the model name and then re-marshal RequestBody struct. But it has only one field, and I need to preserve original message at re-marshalling.
+	//This can be done more efficiently if a full "official" struct by OpenAI is used. In OpenAI v2 it should be ChatCompletionNewParams
+	var raw map[string]json.RawMessage


You could unmarshal from/to map[string]any and not lose the info.
Also, if there's a standard (or commonly used) OpenAI parsing package, I'd suggest a PR converting BBR to use it instead of custom code.
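A minimal sketch of rewriting only the "model" field while passing every other field through untouched, using `map[string]json.RawMessage` as in the diff; `rewriteModel` is a hypothetical name. With `map[string]any` the other fields are also kept, but JSON numbers decode to `float64` unless the decoder is configured with `UseNumber`, whereas `json.RawMessage` never re-interprets values it does not touch. In both cases the output key order can differ from the input, since `json.Marshal` sorts map keys.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// rewriteModel is a hypothetical helper: it replaces the "model" field of a
// JSON request body and keeps all other fields as their original raw bytes.
func rewriteModel(body []byte, newModel string) ([]byte, error) {
	var raw map[string]json.RawMessage
	if err := json.Unmarshal(body, &raw); err != nil {
		return nil, err
	}
	modelBytes, err := json.Marshal(newModel)
	if err != nil {
		return nil, err
	}
	raw["model"] = modelBytes
	return json.Marshal(raw)
}

func main() {
	in := []byte(`{"model":"base/lora/sql-adapter","prompt":"hi","max_tokens":5}`)
	out, err := rewriteModel(in, "sql-adapter")
	if err != nil {
		panic(err)
	}
	fmt.Println(string(out))
	// {"max_tokens":5,"model":"sql-adapter","prompt":"hi"}
}
```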

elevran · 2025-09-18T13:31:53Z

pkg/bbr/handlers/request.go

+	//Convention: [model-family]/<model-name>/lora/<lora-name>
+	//Model name definition (the vLLM side) does not change: <my-arbitrary-lora-name>
+	//Model name in request (the client side): lora-name (no change from before)
+	loraTag := os.Getenv("LORA_TAG") //set via environment


IMO, it would be preferable to make this change in a separate function rather than mutate the current implementation.
Also, using a callback would be more aligned with readers' expectations, given the structure and use of plugins in EPP.
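One possible reading of the callback suggestion, as a sketch only: the LoRA rewrite lives in its own function behind a generic rewriter signature, and the request handler just runs whatever rewriters are registered. `ModelRewriter`, `loraRewriter`, and `applyRewriters` are hypothetical names, not existing BBR or EPP plugin interfaces.

```go
package main

import (
	"fmt"
	"strings"
)

// ModelRewriter is a hypothetical callback type: it receives the requested
// model name and returns the name to forward, plus whether it changed anything.
type ModelRewriter func(model string) (string, bool)

// loraRewriter returns a ModelRewriter that strips "<base>/<tag>/" from names
// following the <base>/<tag>/<adapter> convention and leaves others untouched.
func loraRewriter(tag string) ModelRewriter {
	sep := "/" + tag + "/"
	return func(model string) (string, bool) {
		if _, adapter, ok := strings.Cut(model, sep); ok && adapter != "" {
			return adapter, true
		}
		return model, false
	}
}

// applyRewriters runs each registered rewriter in order; with no rewriters
// configured, the model name passes through unchanged.
func applyRewriters(model string, rewriters ...ModelRewriter) (string, bool) {
	changed := false
	for _, rw := range rewriters {
		if out, ok := rw(model); ok {
			model, changed = out, true
		}
	}
	return model, changed
}

func main() {
	fmt.Println(applyRewriters("llama/8b/lora/sql-adapter", loraRewriter("lora")))
	fmt.Println(applyRewriters("llama/8b"))
}
```

Keeping the rewrite behind a callback means the existing handler path is untouched when the feature is not enabled, which is the separation asked for above.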

elevran · 2025-09-18T13:33:22Z

pkg/bbr/handlers/request.go

+		loraTag = "lora"
+	}
+
+	orig := []byte(requestBody.Model)


It might be easier to parse with strings.Split after converting the byte slice to a string.
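A sketch of the strings.Split approach, assuming the tag appears as a whole path segment: split on "/", find the first segment equal to the tag, and rejoin the segments on either side. `parseLoRAModel` is a hypothetical name. Rejoining with `strings.Join` keeps adapter names that contain "/" intact.

```go
package main

import (
	"fmt"
	"strings"
)

// parseLoRAModel is a hypothetical helper. It splits model on "/" and looks for
// the first segment equal to tag that has at least one segment on each side;
// segments before it form the base model, segments after it the adapter name.
func parseLoRAModel(model, tag string) (base, adapter string, ok bool) {
	parts := strings.Split(model, "/")
	for i, p := range parts {
		if p == tag && i > 0 && i < len(parts)-1 {
			return strings.Join(parts[:i], "/"), strings.Join(parts[i+1:], "/"), true
		}
	}
	return "", "", false
}

func main() {
	fmt.Println(parseLoRAModel("meta-llama/Llama-3.1-8B/lora/sql-adapter", "lora"))
	fmt.Println(parseLoRAModel("meta-llama/Llama-3.1-8B", "lora"))
}
```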

elevran · 2025-09-18T13:36:15Z

pkg/bbr/handlers/request.go

+				suffix = afterTag[nextSlash+1:]
+				logger.V(logutil.DEFAULT).Info("Model name after mutation:" + string(suffix))
+				requestBody.Model = string(suffix)
+				// update only the "model" field in the original raw map so other fields (e.g. prompt) are preserved


Can you explain the convention in the PR description?
I think you're expecting model to be in the format <some-base-model-to-dispatch-to> <tag (e.g., /lora)> <model-name-as-known-by-vllm>.
Is that correct?

elevran · 2025-09-18T13:38:36Z

pkg/bbr/handlers/request.go

+				requestBody.Model = string(suffix)
+				// update only the "model" field in the original raw map so other fields (e.g. prompt) are preserved
+				modelBytes, merr := json.Marshal(requestBody.Model)
+				if merr != nil {


Why not reuse err instead of merr and merr2?

srampal · 2025-09-18T19:08:04Z

pkg/bbr/handlers/request.go

+	//Mutate model name if it contains reserved keyword "lora" indicating that the requested model is lora (served from the same vLLM as the base model and from the same inferencepool)
+	//Convention: [model-family]/<model-name>/lora/<lora-name>
+	//Model name definition (the vLLM side) does not change: <my-arbitrary-lora-name>
+	//Model name in request (the client side): lora-name (no change from before)


Is this comment (line 84) accurate? It says the client side has no change from before, but I think this feature requires the client request to carry the model name in the format listed on line 82, which I believe will be a change for most clients.

Also, it seems that clients will be required to know the exact backend LoRA name, which they usually may not know.

davidbreitgand added 3 commits September 14, 2025 20:00

One liner: add COPY api ./api command to resolve import dependency in…

5ff3c5c

… /pkg/epp/datalayer/factory.go:28:2: no required module provides package sigs.k8s.io/gateway-api-inference-extension/api/v1

Merge pull request #1 from davidbreitgand/bbr-dockerfile-fix

309637b

One liner: add COPY api ./api command to resolve import dependency in…

Support for multiple LoRAs with multiple models via BBR using convent…

ba041ea

…ional name for LoRA: model-name/lora/lora-name

nirrozenbaum reviewed Sep 16, 2025


elevran reviewed Sep 18, 2025


srampal reviewed Sep 18, 2025


Updated .gitignore, updated manifests, updated Makefile

37e75ad

davidbreitgand force-pushed the main branch from 309637b to dd77248 on September 28, 2025 13:39


4 participants